ABSTRACT


Techniques for detecting tactical targets on Forward-Looking
Infrared (FLIR) imagery are being investigated.

The principal topics covered include target and background
models, object extraction and classification, and hardware
technology applicable to real-time implementation.
Introduction


This document reports on the progress of the University
of Maryland/Westinghouse Corporation project entitled
"Algorithms and Hardware Technology for Image Recognition"
during the initial period May 1-July 31, 1976. The project
has two principal goals:

a)  Selection of state-of-the-art algorithms for automatic
    target cueing, and implementation of one or two selected
    algorithms in hardware to demonstrate the feasibility of
    incorporating such algorithms in a reconnaissance sensor.

b)  Exploration of new approaches to image understanding,
    with emphasis on techniques applicable to target cueing
    and similar applications, as well as on image modeling
    for performance prediction.


The project consists of three phases, all of which involve
collaboration between the University and its subcontractor,
the Systems Development Division of Westinghouse. The three
phases and their breakdown into tasks are displayed in the
following table:

Phase   Task

I       (Task and technology review)

        1)  Data base acquisition

            Obtain data bases consisting of real-world imagery
            containing representative target-background
            combinations for selected reconnaissance sensors
            and conditions.

        2)  Review of tri-service operational needs and
            resulting system design constraints

            Meet with tri-service representatives to discuss
            operational target detection problems. Emphasis
            will probably be placed on night vision and
            tactical IR sensors. Choice of sensors and
            operational environments will define constraints
            on hardware design.

        3)  Hardware/algorithm interface

            Hardware constraints will restrict choice of
            algorithms for implementation; algorithm design
            will define requirements on hardware performance.
            This interaction will constitute a continuing
            aspect of the Maryland/Westinghouse collaboration.

II      (Algorithm development and testing)

        4)  Algorithm development

            Exploration of new approaches; evaluation of
            standard approaches, modified as appropriate for
            the given input data.

        5)  Algorithm selection and test

            Algorithm implementation, feasibility testing,
            performance evaluation on selected data bases,
            comparison with current target cueing systems
            performance.

        6)  Target and background modelling

            Development of statistical models; estimation of
            model parameters for given data bases; use of
            models to predict target detection performance.

III     (Hardware design, fabrication, and testing)
In the first quarter, major efforts have been devoted
to a cross-sectional study of the algorithmic steps
comprising a solution to the cueing problem. The purpose has
been to investigate the inherent complexity of Forward-Looking
Infrared (FLIR) imagery and to identify the areas
in which significant contributions to the state of the art
are likely to be made.

The  current  research  effort  in  automatic  target  cueing 
consists  of  seven  project  areas: 

. Data  base  acquisition  and  preprocessing 
. Models  for  FLIR  image  understanding 
. Automatic  object  detection 
. Automatic  threshold  selection 

. Noise region elimination and component feature extraction

. Component  classification  and  target  recognition 
. Hardware  technology  for  algorithm  implementation 

In each project area, one or more approaches have been
studied as described in the following sections of this
report. The Westinghouse review of hardware technology is
appended to this report as a separate volume.


2.  Project  Review 

A1.  Data Base Acquisition and Preprocessing

The image data base which has been investigated in
this report consists of low altitude infrared scenes of
tanks, trucks and APC's against a sparsely wooded or barren
background. The images were digitized by the U.S. Army Night
Vision Laboratory (NVL) from video tapes of the FLIR signal
which drives the cockpit display. Westinghouse reformatted
the data and supplied duplicate digital tapes. Aside from a
variety of noise effects, the images display fiducial marks
and numeric situation data. A number of scenes were imaged
in complement (negative) format.

In all, 13 tapes containing 90 scenes were received.
Each scene image was present as a tape file of 800
records (lines) of 1024 bytes (pixels) each. The pixels had
been quantized to 16 bits. According to the ground truth
supplied, the scenes contained views of targets in various
aspects and at various ranges. The available ground truth is
presented in Table 1.

Using the ground truth, a set of 128x128 pixel
windows containing the identified targets was examined. In
addition, a number of windows containing no targets were
extracted. Among the latter, a distinction was made between
those containing object-like regions ("hot rocks") and those
consisting of noise patterns ("noise"). The windows were
further reduced by sampling to 64x64 image points. The
extracted windows were requantized to 64 gray levels by
dropping the low order two bits. As may be seen from the typical
Ref.No.  Tape  Frame  Image          Ref.No.  Tape  Frame  Image

   1      A      1    T/S               46     E      6    A/E T/D
   2      A      2    T/S               47     E      7    R/E
   3      A      3    T/S R/S           48     E      8    A/E R/D
   4      A      4    T/S R/S           49     E      9    *
   5      A      5    *                 50     E     10    T A
   6      A      6    T/S R/S           51     F      1    A T/S R
   7      A      7    *                 52     F      2    A T/S R
   8      A      8    T/S               53     F      3    A T/S R
   9      A      9    T/S R/S           54     F      4    A T/S R
  10      A     10    T/S               55     F      5    A T/S R
  11      B      1    T/S               56     F      6    R T A
  12      B      2    T/S               57     F      7    R T A
  13      B      3    T/S               58     F      8    R T A
  14      B      4    T/S               59     F      9    R T A
  15      B      5    T/S               60     F     10    A/E
  16      B      6    T/S               61     GH     1    A/E T/D
  17      B      7    T/S               62     GH     2    T/S
  18      B      8    R/E
  ...
                      T/D
  44      E      4    A/E               89     HI    14    T/D
  45      E      5    A/E T/D           90     HI    15    A/E

T = tank          /S = side view
R = truck         /E = end view
A = APC           /D = 3/4 view
* = no target

Table 1.  NVL Data Ground Truth
histograms shown in Figure 1, the original images exhibit
non-uniformity of quantization, and the information loss
due to requantization should be small. A final preprocessing
step complemented those windows which contained targets
in complement form.
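The preprocessing steps described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure actually run on the NVL tapes: it assumes one byte (256 levels) per pixel as the record layout suggests, it subsamples by keeping every second row and column (the report says only "sampling"), and it complements by reflecting about the top of the 64-level range. The function name `preprocess_window` and the synthetic frame are hypothetical.

```python
import numpy as np

def preprocess_window(frame, top, left, complement=False):
    """Cut a 128x128 window, subsample to 64x64, requantize to 64 levels."""
    window = frame[top:top + 128, left:left + 128]
    if window.shape != (128, 128):
        raise ValueError("window falls outside the frame")
    sampled = window[::2, ::2]      # assumed sampling: every second row/column
    reduced = sampled >> 2          # drop the two low-order bits (256 -> 64 levels)
    if complement:
        reduced = 63 - reduced      # undo complement (negative) imaging
    return reduced.astype(np.uint8)

# Synthetic stand-in for one 800-line by 1024-pixel scene file.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(800, 1024), dtype=np.uint8)

win = preprocess_window(frame, top=100, left=200)
neg = preprocess_window(frame, top=100, left=200, complement=True)
```

Dropping low-order bits rather than rescaling keeps the mapping exact and cheap, which is consistent with the report's remark that little information is lost when the original quantization is already non-uniform.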

An assessment of the data base reveals a wide
range of target sizes and levels of thermal emission. To
the naked eye, some of the small indistinct targets, while
detectable, appear virtually unclassifiable. The larger
targets do exhibit characteristic shapes, though. We have
assumed at this stage that it is more important to detect
targets at long range than to classify them once their
shapes are discernible at closer range. However, shape
recognition for target classification will be studied
extensively in the near future.

The  variability  of  the  images  and  the  large 
amount  of  noise  present  indicate  the  need  for  the  acquisition 
of  further  data  bases  to  substantiate  or  challenge  the 
assumptions  made  in  the  present  study. 
A2.  Image Processing Software

Software development has progressed in the implementation
of MINIXAP, a research-oriented picture processing
system designed for the PDP 11/45 computer. Its current
capabilities have enabled it to assume some of the computing
tasks in processing the NVL imagery data bases.

Figure 2 shows the basic hardware configuration
of the system. Picture files are stored locally on disk,
9-track or 7-track tape. Picture data can be transferred to
the PDP 11/45 from the mass storage facilities of a UNIVAC
1108 computer via a high-speed UNIVAC 1108 channel. Images
may be input from a precision drum-type optical scanner,
and output to either a CRT monitor or a precision Polaroid
or 35mm film recorder. Medium speed communication lines to
UNIVAC 1106 and UNIVAC 1108 computers provide additional
paths for picture data transfer and for program development
activities.

Figure 2:  Picture Processing Hardware Configuration

MINIXAP has been designed in a four-level hierarchy.
The bottom level, written in PDP 11 assembly language,
manages a data base of picture files and provides
device-independent I/O of picture intensity data. The second level
provides a convenient command interface to the bottom level
from the programming language LISP. The third level, written
in LISP, is a collection of picture processing algorithm
skeletons. An algorithm skeleton is a program-like
structure in which certain arguments and functions are left
unspecified until the skeleton is prepared for execution. At
that time, the appropriate arguments and functions are
associated with the skeleton, and the completed program may
be executed. Many common image operations are sufficiently
similar that they may be regarded as instantiations of
algorithm skeletons. Figure 3 illustrates an algorithm
skeleton for a picture processing operation which uses one
input picture, generates one output picture, and in which
the transformation function is a neighborhood operator.
The Roberts gradient operator is an example of a neighborhood
operator using a 2x2 neighborhood size. The fourth
level in the MINIXAP hierarchy is a collection of
application packages, which use the algorithm skeletons to perform
picture processing operations.

Figure 3:  A Picture Processing Algorithm Skeleton for
Neighborhood Operations.
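The idea of an algorithm skeleton can be illustrated with a short sketch. MINIXAP itself is written in LISP and PDP 11 assembly language, and its actual interfaces are not given in this report, so the Python below is only an analogy: the names `neighborhood_skeleton`, `bind` and `roberts` are hypothetical, and the Roberts gradient is written in one common form (sum of absolute diagonal differences).

```python
import numpy as np

def neighborhood_skeleton(height, width):
    """Skeleton for one-input, one-output neighborhood operations.

    The neighborhood size is fixed here; the transformation function
    is left unspecified until bind() is called.
    """
    def bind(operator):
        def run(picture):
            rows, cols = picture.shape
            out = np.zeros((rows - height + 1, cols - width + 1))
            for i in range(out.shape[0]):
                for j in range(out.shape[1]):
                    window = picture[i:i + height, j:j + width].astype(float)
                    out[i, j] = operator(window)
            return out
        return run
    return bind

def roberts(window):
    """Roberts cross gradient on a 2x2 window (absolute-value form)."""
    return abs(window[0, 0] - window[1, 1]) + abs(window[0, 1] - window[1, 0])

# Instantiate the skeleton with a 2x2 neighborhood and the Roberts operator.
roberts_gradient = neighborhood_skeleton(2, 2)(roberts)

picture = np.zeros((4, 4), dtype=np.uint8)
picture[:, 2:] = 10          # vertical step edge between columns 1 and 2
edges = roberts_gradient(picture)
```

Binding a different function (for example a Laplacian on a 3x3 window) reuses the same scanning loop, which is the economy the skeleton mechanism is meant to provide.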

Because of the interactive nature of the user
interface language LISP, the generality of the data
structures afforded by LISP, and the collection of algorithm
skeletons provided, MINIXAP should be a useful tool for
picture processing algorithm development.

The  following  utility  routines  and  application 
packages  are  currently  available  in  MINIXAP: 

(1)  Utilities  for: 

(a)  picture  printing 

(b)  histogram  generation  and  printing 

(c)  picture  copying 

(d)  filename  manipulation  to  facilitate  the  handling 
of  large  data  bases 

(2)  Application  packages 

(a)  a cooccurrence  matrix  generator 

(b)  an  edge  detection  package,  containing  the  Roberts 
gradient,  Laplacian,  "3  by  3"  gradient,  and  "DIFF" 
operators 

(c)  a propagation package, containing shrink/expand
routines, thinning operators, distance transform
and skeletonization operators, and border-following
routines


(d)  a picture  compression  package  for  block  averaging 


Thus far, MINIXAP has been used in the object
windowing, automatic threshold determination and noise
cleaning phases of processing of FLIR imagery. It is
anticipated that much of the future image processing algorithm
development and testing for FLIR imagery will be done using
MINIXAP.


The purpose of an image model is to define and
account for significant variables of an image processing
problem situation. Such models can suggest or substantiate
algorithmic techniques, predict critical parameter values
such as thresholds, and provide performance measures. As
an initial step, we have chosen to model one aspect of FLIR
imagery based on the simplified assumption that targets
appear as homogeneous "hot" regions within a homogeneous
"cooler" surround. Operations which respond to edges by
assigning high values also respond to homogeneity with low
values. A model which describes the transition from
background to object can be used to predict a threshold gray
level for separating object from background. In future
work we plan to investigate a model involving the projective
geometry of the image, to be used in predicting object size
and orientation.

The model presented in this section is basically
a first approximation to the real-world situation, since it
assumes that the target and background have essentially
constant gray levels (except for noise), and that the edges
between target and background are ramplike. A more realistic
model would take into account gradations of gray level
across the image (e.g., due to range or terrain slope
differences), and would treat edges as smooth transitions.
(Gradations across the image may be unimportant when one
processes relatively small windows, but could not be ignored
when analyzing entire frames.)

In spite of these limitations, the model does
qualitatively predict the statistical measurements made on
real images. It constitutes a first step in the
development of more accurate models that should provide
quantitative fits to real-image data.
In scene analysis it is often required to segment an
image into background and object, where an object is a light
area embedded in a background of darker gray level (or vice
versa). A simple segmentation procedure is to select a gray
level threshold to discriminate the pixels; pixels with gray
level higher than the threshold are mapped into the object
class and the rest into the background class. The optimum
threshold for discrimination, given by Bayes decision theory,
is the one that satisfies

$$p(t \mid \omega_0)\,P(\omega_0) = p(t \mid \omega_1)\,P(\omega_1) \tag{1}$$

where $t$ is the gray level threshold; $\omega_1$ and $\omega_0$ are the two
classes (object and background respectively); $P(\cdot)$ and $p(\cdot)$ are
a priori and conditional probabilities. When the
component density parameters or the prior probabilities are
unknown, the location of the valley between the two modes of
the mixture density, corresponding to the two modes of the
component densities (assuming they are unimodal and "well
separated"), is chosen as the threshold. But often such a
threshold cannot be readily derived from the mixture
histogram.

In the following we develop a model for optimal threshold
selection which takes more accurate account of image
structure.
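To make eqn. (1) concrete, the sketch below solves it numerically for one special case that the report does not assume at this point: both class-conditional densities are taken to be normal with a common standard deviation. The function names and the numerical values are illustrative only.

```python
import numpy as np

def normal_pdf(t, mean, sigma):
    """Normal density evaluated at gray level t."""
    return np.exp(-0.5 * ((t - mean) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def bayes_threshold(s0, s1, sigma, prior0, prior1):
    """Gray level between the class means satisfying eqn. (1).

    Assumes s0 < s1 and normal class-conditional densities with equal
    sigma; the crossing point is located on a fine grid.
    """
    t = np.linspace(s0, s1, 10001)
    diff = prior0 * normal_pdf(t, s0, sigma) - prior1 * normal_pdf(t, s1, sigma)
    return t[np.argmin(np.abs(diff))]

t_equal = bayes_threshold(20.0, 40.0, 4.0, 0.5, 0.5)
t_skewed = bayes_threshold(20.0, 40.0, 4.0, 0.8, 0.2)
```

With equal priors the threshold falls midway between the class means; when the background is more probable the threshold moves toward the object mean. Both results depend on knowing the priors and the density parameters, which is the practical difficulty the text goes on to address.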


B1.  The noise-free case in one dimension

Let Fig. 4 represent the spatial gray level variation
of a one-dimensional image. The background and object have
gray levels $s_0$ and $s_1$ and are spatially connected by a ramp
edge. Let $e$ be the output of an edge operator*. Assuming
the edges on both sides of the object are equally steep, the
joint (gray level, edge value) histogram $p(s,e)$ will be as
shown in Fig. 5. The impulse functions at $(s_0,0)$ and $(s_1,0)$
correspond to the background and the object respectively,
the strength of these impulse functions being proportional
to the areas of the background and the object. At the edge,
the output of the edge operator is maximum ($= e_m$) and
is constant for all gray levels at the edge. For $s_0 < s < s_1$,
$p(s,e_m)$ is constant. Let

$L$ = total length of the image
$d = |s_1 - s_0|$ (see Fig. 2)
$w$ = width of edge on either side of object (see Fig. 1)

and $e(i) \triangleq |s(i) - s(i-1)|$. Then

$$e_m = \frac{d}{w} \tag{2}$$

$$P(e = e_m) = \frac{2w}{L}$$

and

$$p(s,e_m) = \frac{P(e_m)}{d} = \frac{2w}{Ld}, \quad \text{for } s_0 < s < s_1. \tag{3}$$

Substituting eqn. (2) in eqn. (3) we get

$$p(s,e_m) = \frac{2}{L\,e_m}. \tag{4a}$$

*This output will be referred to here, generically, as the
gradient.


Thus

$$p(s,e_m \mid s_0 < s < s_1) \propto \frac{1}{e_m} \tag{4b}$$

where $\propto$ denotes "is proportional to". Now the mixture
density of gray level $s$ is

$$p(s) = P(\omega_0)\,\delta(s-s_0) + P(\omega_1)\,\delta(s-s_1) + p(s,e_m). \tag{5}$$

Obviously this is a mixture of three component densities,
rather than two as is assumed in the classical approach to
threshold selection. Let us define the ramp edge as class
$\omega_2$ and let $\omega_0$ and $\omega_1$ be as defined in eqn. (1). Since

$$\delta(s-s_0) = p(s \mid \omega_0) \tag{6a}$$

$$\delta(s-s_1) = p(s \mid \omega_1) \tag{6b}$$

and

$$p(s,e_m) = p(s \mid e=e_m)\,P(e_m) = p(s \mid \omega_2)\,P(\omega_2) = \frac{1}{d}\cdot\frac{2w}{L} \tag{6c}$$

eqn. (5) can be written as

$$p(s) = P(\omega_0)\,\delta(s-s_0) + P(\omega_1)\,\delta(s-s_1) + P(\omega_2)\,\frac{1}{d}$$

$$\phantom{p(s)} = P(\omega_0)\,p(s \mid \omega_0) + P(\omega_1)\,p(s \mid \omega_1) + P(\omega_2)\,p(s \mid \omega_2) \tag{7}$$

which defines the mixture density. Hence in segmentation,
the following choices are available:

(i)  Treat the problem as a 3-class discrimination
problem and extract $\omega_1$ by Bayes' decision rule.

(ii)  Include an unspecified fraction $f'$ of the pixels
from $\omega_2$ in $\omega_1$ by selecting a threshold $t_0$,
$s_0 < t_0 < s_1$, such that $t_0$ discriminates between
$\omega_0$ and $\omega_1$ (at zero gradient).

(iii)  Select a threshold to include a specified
fraction $f$ of pixels from $\omega_2$ in $\omega_1$.
In choice (i) the Bayes decision rule gives a set of
piecewise linear discriminant functions. In particular the
decision rule is:

$$s < t_s,\; e < t_e \;\Rightarrow\; (s,e) \in \omega_0 \tag{8a}$$

$$s > t_s,\; e < t_e \;\Rightarrow\; (s,e) \in \omega_1 \tag{8b}$$

$$e > t_e \;\Rightarrow\; (s,e) \in \omega_2 \tag{8c}$$

where $t_s$ is any threshold satisfying $s_0 < t_s < s_1$,
and $t_e$ is any threshold satisfying $0 < t_e < e_m$. The above
decision rule is given by

$$(s,e) \in \omega_k \quad \text{if} \quad \arg\max_i\; p(s,e \mid \omega_i)\,P(\omega_i) = k. \tag{9}$$

The three discriminant functions are shown in Fig. 6. In
choice (ii) the space $\Omega = \{(s,e)\}$ may first be classified
into $\omega_2$ and $\bar{\omega}_2$ by selecting a threshold $t_e'$, $0 < t_e' < e_m$. The
set $\bar{\omega}_2$ may then be classified into $\omega_0$ and $\omega_1$ by selecting a
threshold $t_0$, $s_0 < t_0 < s_1$; however, in the final decision
rule the discriminant $s = t_0$ is extended to all gradient
values ($e$) and classifies the entire space $\Omega$ into only two
classes, $\omega_0$ and $\omega_1$; that is, the decision rule is

$$s < t_0 \;\Rightarrow\; (s,\cdot) \in \omega_0 \tag{10a}$$

$$s > t_0 \;\Rightarrow\; (s,\cdot) \in \omega_1 \tag{10b}$$

Thus the pixels $\{(s,e) \mid s > t_0,\; e = e_m\}$ are included in $\omega_1$.
Essentially the threshold $t_0$ is a Bayes classifier for all
pixels with zero gradient. Hence, the threshold $t_e'$ can be
chosen arbitrarily close to zero, i.e., $t_e' \to 0^+$. Thus in
choice (ii) the segmentation procedure is to select a
threshold $t_0$ that optimally classifies all pixels with
gradient less than $0^+$ into two classes, $\omega_0$ and $\omega_1$, and then
extend the threshold to all gradients as in eqn. (10) above.

The fraction $f'$ of pixels from $\omega_2$ included in $\omega_1$ can be easily
shown to be

$$f' = \frac{s_1 - t_0}{s_1 - s_0}.$$

The discriminant function is shown in Fig. 7. It may be noted
that the threshold $t_e'$ would satisfy the same Bayesian
optimality criterion as $t_e$ in choice (i), and $t_0$ would
satisfy the same criterion as $t_s$. In other words, the decision
rule (8) will remain unchanged if $t_e$ is replaced by $t_e'$ and
$t_s$ is replaced by $t_0$. In choice (iii) a threshold $t_e'$ is
selected to classify $\Omega$ into $\omega_2$ and $\bar{\omega}_2$. Then the threshold
$t_1$ is determined such that

$$\Pr(s > t_1 \mid \omega_2) = f.$$

Now $t_1$ is used as a gray level threshold to partition the
space $\Omega$ into $\omega_1$ and $\omega_0$ according to the rule:

$$s > t_1 \;\Rightarrow\; (s,\cdot) \in \omega_1$$

$$s < t_1 \;\Rightarrow\; (s,\cdot) \in \omega_0.$$

If the class conditional density of $\omega_2$ is symmetric then
$f = 0.5$ gives $t_1$ as the class conditional mean of $\omega_2$. As a
variation of choice (iii) one may select $t_1$ as the class
conditional mean regardless of the shape of the class
conditional density of $\omega_2$. In this case every point in the
edge between the object and the background is treated as a
potential candidate for the threshold and the actual
threshold selected is the mean value of all such candidate
thresholds. Thus the threshold $t_1$ is given by

$$t_1 = E[s \mid \omega_2] = E[s \mid e_m].$$

Hence

$$t_1 = \int_{-\infty}^{\infty} s\,p(s \mid e_m)\,ds = \int_{s_0}^{s_1} \frac{1}{d}\,s\,ds = \frac{s_1^2 - s_0^2}{2(s_1 - s_0)} = \frac{s_1 + s_0}{2}$$

which is not surprising. Clearly, this suggests a method for
selecting the threshold: choose the points where the output
of the edge operator is high ($e = e_m$ in the example); the
mean gray level of such points gives the threshold.
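The threshold rule just derived can be checked on a synthetic one-dimensional profile. The sketch below is a minimal illustration under the noise-free assumptions of this section (constant background and object levels, ramp edges of equal steepness); the profile dimensions and the name `edge_mean_threshold` are illustrative choices, not values from the report.

```python
import numpy as np

def edge_mean_threshold(profile):
    """Mean gray level of the points where the edge operator output is maximal.

    The edge operator is e(i) = |s(i) - s(i-1)|, as defined in the text.
    """
    e = np.abs(np.diff(profile, prepend=profile[0]))
    on_edge = np.isclose(e, e.max())
    return profile[on_edge].mean()

s0, s1, w = 10.0, 50.0, 4          # background level, object level, edge width
rising = np.linspace(s0, s1, w + 1)[1:]
falling = np.linspace(s1, s0, w + 1)[1:]
profile = np.concatenate([
    np.full(20, s0),   # background
    rising,            # ramp edge up to the object
    np.full(10, s1),   # object
    falling,           # ramp edge back down
    np.full(20, s0),   # background
])

t1 = edge_mean_threshold(profile)
```

Because the two ramps are mirror images, the samples selected on the rising and falling edges average to the midpoint between the two levels, in agreement with the closed-form result above.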


B2.  Extension to two dimensions

In extending the simple case of Fig. 1 to two-dimensional
space the following assumptions are made: the
object is of constant gray level $s_1$ and is convex, the
background is of constant gray level $s_0$, and at the edge of the
object gray levels increase from $s_0$ at the background to $s_1$
at the object monotonically and at a constant rate (see
Fig. 8).

The structure of the joint histogram of $(s,e)$ in this
case remains basically the same as in the one-dimensional
case with one major exception. If two constant gray level
contours $c_0$ and $c_1$ are drawn through the points in the edge
region, with the gray level of $c_0$ greater than that of $c_1$,
then because of the shape of the object, $c_0$ contains fewer
pixels than $c_1$ does. Three simple observations can be made
regarding these contours: the number of pixels in a contour
is proportional to the length of the contour; the length
of the contour monotonically increases as the distance of
the contour from the object (measured in a direction
orthogonal to the contour) increases; and the gray level of the
contour decreases linearly as the distance from the object
increases. Thus $p(s \mid e_m)$ is a monotonically decreasing
function from $s_0$ to $s_1$. In the simplest case of circular object
shape (and circular contours) this function can be shown to
be linear, as shown in Fig. 9. If the Laplacian is used as
the edge operator then instead of a monotonic function
$p(s \mid e_m)$ will be two delta functions at $s = s_0$ and $s = s_1$.
However, in either case, the density $p(s,e)$ is still a mixture
of three components and the three decision rules corresponding
to the three choices still hold. Thus a threshold can still
be selected by taking the conditional expectation of the gray
level with the condition $e = e_m$.


Let the noise present in a scene be i.i.d. (independent
identically distributed) with zero mean normal distribution
(variance $= \sigma^2$). The new gray level in the two-dimensional
image space is

$$x(i,j) = s(i,j) + n(i,j) \tag{12}$$

where $s$ is the original gray level as shown in Fig. 8 and
Fig. 9, $n$ is the normally distributed noise, and $x(i,j)$ is
the gray level of the noisy image at $(i,j)$. Clearly the noise
is independent of the three classes $\omega_0$, $\omega_1$, and $\omega_2$. Thus the
components of the mixture density are given by the convolution
of the noise density with the original component densities.
Specifically

$$p(x \mid \omega_0) = p(s \mid \omega_0) * p(n) = N(s_0, \sigma^2) \tag{13a}$$

$$p(x \mid \omega_1) = p(s \mid \omega_1) * p(n) = N(s_1, \sigma^2) \tag{13b}$$

$$p(x \mid \omega_2) = p(s \mid \omega_2) * p(n) \tag{13c}$$

The density function for $\omega_2$, unfortunately, is not so simple
as that of $\omega_0$ or $\omega_1$. For the case shown in Fig. 9 the general
shape of the component densities is as shown in Fig. 10. Thus
it may not be feasible to select the threshold by locating the
valley, in the mixture density, between the modes
corresponding to those of $p(x \mid \omega_0)$ and $p(x \mid \omega_1)$.

To get some insight into the joint p.d.f. of gray level
and gradient let us assume that the edge operator is of the
form

$$e(i,j) = \sqrt{[x(p_1)-x(p_2)]^2 + [x(p_3)-x(p_4)]^2} \tag{14}$$

where $p_1$, $p_2$, $p_3$, and $p_4$ are four pixels in the neighborhood
of $(i,j)$. Let

$$y = x(p_1) - x(p_2) \quad \text{and} \tag{15a}$$

$$z = x(p_3) - x(p_4). \tag{15b}$$

Thus both $y$ and $z$ are $N(0, 2\sigma^2)$ and, of course, independent
in both the background and the object region. This is true
because

$$x(p_1) - x(p_2) = s(p_1) + n(p_1) - s(p_2) - n(p_2), \tag{16}$$

and in the background as well as in the object region

$$s(p_1) = s(p_2), \tag{17a}$$

hence

$$x(p_1) - x(p_2) = n(p_1) - n(p_2). \tag{17b}$$

Since $n(p_1)$, $n(p_2)$ are i.i.d. as $N(0,\sigma^2)$, $n(p_1) - n(p_2)$ is
$N(0, 2\sigma^2)$ and so is $y$. In the edge region, however,

$$s(p_1) \neq s(p_2) \tag{18a}$$

$$s(p_3) \neq s(p_4). \tag{18b}$$

Thus for $\omega_0$ and $\omega_1$

$$e(i,j) = \sqrt{y^2 + z^2}$$

where $y$ and $z$ are independent $N(0, 2\sigma^2)$. Therefore $e(i,j)$ is
Rayleigh distributed with p.d.f.

$$p(e \mid \omega_0) = p(e \mid \omega_1) = \frac{e}{2\sigma^2}\exp\!\left[-\frac{e^2}{4\sigma^2}\right] u(e) \tag{19}$$

where $u(\cdot)$ is the unit step function. The general shape of
the function is shown in Fig. 11 for $\sigma = \sigma_1$ and $\sigma = \sigma_2 > \sigma_1$.
The mean and variance of the gradient $e$ in $\omega_0$ and/or $\omega_1$ are
easily computed as

$$E[e \mid \omega_0] = E[e \mid \omega_1] = \sqrt{\pi}\,\sigma \tag{20a}$$

$$\mathrm{Var}[e \mid \omega_0] = \mathrm{Var}[e \mid \omega_1] = (4-\pi)\,\sigma^2. \tag{20b}$$

Here a few observations are in order. First of all, the
presence of noise has not only spread the gray level distribution
in the otherwise homogeneous region (object and background)
but has spread the gradient distribution also. Secondly, the
dispersion of the gradient in the otherwise homogeneous regions
is directly proportional to the noise dispersion. The mean
gradient in the homogeneous region also increases with the
noise dispersion. For the sake of tractability, assuming
independence between gradient and gray level, the joint component
densities of $\omega_0$ and $\omega_1$ are given by the products of the two
normal p.d.f.'s with the Rayleigh p.d.f. Both the component
densities are unimodal in the bivariate space, the modes
occurring at $(s_0, \sqrt{2}\,\sigma)$ and $(s_1, \sqrt{2}\,\sigma)$ for $\omega_0$ and $\omega_1$
respectively.
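The moments in eqns. (20a) and (20b) are easy to check by simulation. The sketch below draws independent normal noise at four distinct pixels of a homogeneous region and forms the gradient of eqn. (14); the sample size, seed and value of sigma are arbitrary choices made for illustration, and the function name is hypothetical.

```python
import numpy as np

def homogeneous_gradient_samples(sigma, count, seed=0):
    """Gradient magnitudes in a constant-gray-level region with N(0, sigma^2) noise."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=(count, 4))   # noise at p1, p2, p3, p4
    y = noise[:, 0] - noise[:, 1]                     # x(p1) - x(p2); signal cancels
    z = noise[:, 2] - noise[:, 3]                     # x(p3) - x(p4); signal cancels
    return np.hypot(y, z)

sigma = 3.0
e = homogeneous_gradient_samples(sigma, 200_000)

sample_mean, predicted_mean = e.mean(), np.sqrt(np.pi) * sigma
sample_var, predicted_var = e.var(), (4 - np.pi) * sigma ** 2
```

The simulation assumes that the four pixels are distinct, so that `y` and `z` are independent as the text requires; an operator that reuses a pixel in both differences would introduce correlation and the Rayleigh form would no longer hold exactly.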

The case of the edge region, however, is complicated
by the inequalities (18). Assuming that in the neighborhood
of every point $(i,j)$

$$s(p_1) - s(p_2) = m \quad \text{and} \tag{21a}$$

$$s(p_3) - s(p_4) = n \tag{21b}$$

independent of location $(i,j)$, then $y$ and $z$ of eqn. (15)
become $N(m, 2\sigma^2)$ and $N(n, 2\sigma^2)$, respectively. Hence in the edge
region the cumulative distribution function $P(e_1 \mid \omega_2)$ is

$$\int_0^{e_1}\!\!\int_0^{2\pi} \left[\frac{1}{4\pi\sigma^2}\exp\!\left(-\frac{1}{4\sigma^2}\big((e\cos\theta-m)^2 + (e\sin\theta-n)^2\big)\right)\right] e\,de\,d\theta \tag{22a}$$

where $e\,de\,d\theta$ is the differential area in polar coordinate
system $(e,\theta)$ and the integrand in the square bracket is the joint
p.d.f. of $(y,z)$ transformed into polar coordinates by

$$y = e\cos\theta$$
$$z = e\sin\theta.$$

The probability density function $p(e_1 \mid \omega_2)$ is obtained by
differentiating expression (22a) w.r.t. $e_1$, which yields

$$p(e_1 \mid \omega_2) = \frac{e_1}{4\pi\sigma^2}\exp\!\left[-\frac{1}{4\sigma^2}\left(e_1^2+m^2+n^2\right)\right] \int_0^{2\pi} \exp\!\left[\frac{e_1}{2\sigma^2}\,(m\cos\theta + n\sin\theta)\right] d\theta. \tag{23}$$

In expression (22) and eqn. (23)

$$e_m = \sqrt{m^2+n^2}. \tag{24a}$$

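Let us introduce a new variable $\phi$ defined by

$$\phi = \tan^{-1}\frac{m}{n}. \tag{24b}$$

In the edge region in the noise-free case $\phi$ gives the
direction of gradient, where $e_m$ gives its magnitude. Substituting
eqn. (24) into eqn. (23) we get

$$p(e \mid \omega_2) = \frac{e}{4\pi\sigma^2}\exp\!\left[-\frac{1}{4\sigma^2}\left(e^2+e_m^2\right)\right] \int_0^{2\pi} \exp\!\left[\frac{e\,e_m}{2\sigma^2}\sin(\phi+\theta)\right] d\theta$$

$$\phantom{p(e \mid \omega_2)} = \frac{e}{4\pi\sigma^2}\exp\!\left[-\frac{1}{4\sigma^2}\left(e^2+e_m^2\right)\right] \int_{\phi}^{2\pi+\phi} \exp\!\left[\frac{e\,e_m}{2\sigma^2}\sin\theta\right] d\theta.$$

Denoting the integral $\int_{\phi}^{2\pi+\phi} [\,\cdot\,]\,d\theta$ by $F(\phi)$ we have

$$p(e \mid \omega_2) = \frac{e}{4\pi\sigma^2}\exp\!\left[-\frac{1}{4\sigma^2}\left(e^2+e_m^2\right)\right] F(\phi). \tag{25}$$

When $e_m = 0$, eqn. (25) will reduce to eqn. (19).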
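Eqn. (25) can be evaluated numerically as a check on the derivation. The sketch below integrates $F(\phi)$ with a simple rectangle rule over one full period and compares the result with two reference values: the Rayleigh density of eqn. (19) when $e_m = 0$, and the closed form obtained from the standard identity $\int_0^{2\pi} \exp(k\sin\theta)\,d\theta = 2\pi I_0(k)$, which is not stated in the report but shows that eqn. (25) is a Rice density. The function names and test values are illustrative.

```python
import numpy as np

def edge_gradient_pdf(e, e_m, sigma, phi=0.0, steps=2000):
    """Evaluate eqn. (25) at a scalar e by numerically integrating F(phi)."""
    theta = phi + np.linspace(0.0, 2 * np.pi, steps, endpoint=False)
    k = e * e_m / (2 * sigma ** 2)
    F = np.exp(k * np.sin(theta)).sum() * (2 * np.pi / steps)
    scale = e / (4 * np.pi * sigma ** 2)
    return scale * np.exp(-(e ** 2 + e_m ** 2) / (4 * sigma ** 2)) * F

def rayleigh_pdf(e, sigma):
    """Eqn. (19) for e >= 0."""
    return e / (2 * sigma ** 2) * np.exp(-e ** 2 / (4 * sigma ** 2))

def rice_pdf(e, e_m, sigma):
    """Closed form of eqn. (25) using the modified Bessel function I0."""
    k = e * e_m / (2 * sigma ** 2)
    return e / (2 * sigma ** 2) * np.exp(-(e ** 2 + e_m ** 2) / (4 * sigma ** 2)) * np.i0(k)

p_no_edge = edge_gradient_pdf(4.0, 0.0, 2.0)
p_edge = edge_gradient_pdf(4.0, 5.0, 2.0, phi=1.0)
```

Since the integrand is periodic and the integral runs over a full period, the value of $F(\phi)$ does not depend on $\phi$; the gradient magnitude distribution in the edge region therefore depends on the edge only through $e_m$.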

Thus now the gradient in the edge region is no longer
constant as it was in the noise-free case. The joint class
conditional density of $\omega_2$ is given by the product of eqn.
(25) and eqn. (13c). The mode is a function not just of $m$
and $n$ as before, but also of $\sigma^2$, the noise variance. If the
class conditional density in the noise-free case were uniform
(as in Fig. 5) then the mode in the noisy case would be a
straight line segment parallel to the $x$ (gray level) axis.

In  segmentation  we  still  have  the  three  choices,  (i)  , 

(ii) , and  (iii) , available  to  us  just  as  in  the  noise-free 


case,  except  that  the  corresponding  decision  rules  must  change. 
In  choice  (i)  determining  the  thresholds  corresponding  to  the 
Bayes  decision  was  trivial  in  the  noise-free  case.  For  ex- 
ample, p(e)  is  a mixture  of  two  delta  functions,  at  e = e^  and 
e = 0,  corresponding  to  p(e|u>2)  and  p(e|u>2),  respectively. 

Thus  for  all  e in  the  range  0 < e < em  we  have  p(e|ui2)  = C and 
p(e|u>2)  = 0,  and  any  such  e is  an  optimal  classifier;  the  prob- 
lem is  only  to  determine  em  from  the  noise-free  sample  pic- 
ture^), which  is  trivial.  However,  in  the  noisy  case  p(e) 
is  a mixture  of  p.d.f.'s  given  by  eqn.  (19)  and  eqn.  (25) 
which  may  look  as  shown  in  Fig.  12.  The  Bayes  discriminant 
function  is  satisfied  by  t (see  Fig. 12)  where 

P (fce 1^2 ^ = P^telw2^  p ' (26) 

but determining $t_e$ now requires knowledge of the component
p.d.f. parameter values and a priori probabilities. Hence
$t_e$ cannot be determined from the sample picture(s) alone. Instead
one may locate the "valley" $v_e$ (see Fig. 12) in the mixture
density (the mixture density can be estimated, e.g., by
histogram p, from sample picture(s)) and $t_e$ can be estimated
by $v_e$. Thus an alternative (for the noisy case) to the decision
rule (8) is to seek valleys in the joint gray level
and gradient histogram of the sample picture(s). Curves
given by such valleys (see Fig. 13) can then be used as discriminants.
In choice (ii), similarly, the threshold $t_x$ can
be obtained by locating the valley $v_x$ in the conditional
histogram $\hat p(x,e|e < t_e)$ where, as in the noise-free case,
$t_e \to 0$. The threshold $t_x$ is then extended to all gradient
values to include an unspecified fraction $f'$ of the edge
pixels in the object class, as shown in Fig. 14. The discriminant function
in this case is given by

$$x = v_x , \qquad (27)$$

a straight line through $(v_x, 0)$ and parallel to the $e$-axis.
The fraction $f'$ can be altered by changing the slope of this
discriminant (Fig. 15), that is, by selecting a straight line
of the form

$$x - e/a = v_x \qquad (28)$$

and by changing the slope $a$ one can include different
numbers of edge pixels in the object class (of course, due to the presence
of noise, this process includes some background pixels in the object class).
Section E1 uses this concept. In choice (iii) the threshold
$t_x$ is selected from the class conditional density $p(x,e|\omega_e)$.
This requires isolating $\omega_e$ from $\omega_{\bar e}$ first. In the noise-free
case, as mentioned earlier, this is trivial since $p(e|\omega_e)$ as
well as $p(e|\omega_{\bar e})$ are delta functions. Here, the density functions
are no longer delta functions; but the gradient threshold
$t_e$, used to classify $\Omega$ into $\omega_e$ and $\omega_{\bar e}$, can be selected by
valley seeking in the e-domain; that is, the threshold $t_e$ is
estimated by the valley $v_e$ in the gradient histogram $p(e)$ of
the sample picture(s). Once $t_e$ is selected, and consequently
the set of edge pixels $\omega_e = \{(x,e)\,|\,e > t_e\}$ is determined, the
gray level threshold $t_x$ can be determined, as in the noise-free
case, either by

$$\Pr(x > t_x|\omega_e) = f \qquad (29a)$$

or by

$$t_x = E[x|\omega_e] \qquad (29b)$$

where $f$ is the specified fraction of edge pixels to be included
in the object class. This is demonstrated in Fig. 16. A problem

arises, however, when the a priori probabilities in eqn. (26)
are quite different from each other (e.g., $P(\omega_{\bar e}) \gg P(\omega_e)$)
or the class conditional p.d.f.'s in eqn. (26) are not well
separated. In such a case the valley in the gradient histogram
$p(e)$ is not prominent enough to be extracted easily.
Hence the threshold $t_e$ in eqn. (26) cannot be estimated by
$v_e$ since $v_e$ itself is unknown. In this case one may resort
to the following technique for estimating $t_e$: choose a fraction
$q$ and determine the point $v_e'$ such that

$$\Pr(e > v_e') = q. \qquad (30)$$

The quantity $v_e'$ is easily estimated from the gradient histogram.
The threshold $t_e$ is estimated by $v_e'$ and is used to
segment certain training sample(s). The quantity $q$ is then
altered to give a different gradient threshold $t_e$ and is
again used to segment the training samples. The fraction $q$
is changed until the probabilities of errors of the first and
second kinds (false alarm and miss) have attained satisfactory
values. The corresponding value of $q$ estimates $t_e$ in subsequent
test samples. This is shown in Fig. 17.
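
The percentile rule of eqn. (30) amounts to a scan of the gradient histogram. The following is a minimal Python sketch, assuming integer gradient values indexed from 0 and a histogram supplied as a list of counts; the function name and the sample histogram are illustrative and do not come from the report.

```python
def percentile_cutoff(hist, q):
    """Return the smallest gradient value v such that the fraction of
    pixels with gradient > v is at most q.  hist[k] is the number of
    pixels whose gradient value equals k."""
    total = sum(hist)
    if total == 0:
        raise ValueError("empty histogram")
    above = total                      # pixels with gradient > v, for v = -1
    for v, count in enumerate(hist):
        above -= count                 # now: pixels with gradient > v
        if above <= q * total:
            return v
    return len(hist) - 1


# Hypothetical gradient histogram: many low-gradient interior pixels
# and a small number of high-gradient edge pixels.
example_hist = [50, 30, 10, 4, 3, 2, 1]
cutoff = percentile_cutoff(example_hist, 0.12)
```

In the procedure described above, q would be varied and the resulting cutoff applied to training samples until the false alarm and miss rates are acceptable; that outer loop is omitted here because it depends on ground truth.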

It is planned to carry out thresholding experiments along
the lines suggested by this model (see Sections D and E1 for
some first steps in this direction), and to refine the model
as necessary. Meanwhile, it is felt that the model has provided
important insights into the use of local image property
statistics in image segmentation. Other approaches along
similar lines should be investigated.


C.  Automatic  Object  Detection 

Given an image divided into windows which may or
may not contain targets, it is necessary to eliminate "noise"
windows which contain no discernible objects. Subsequently,
those windows containing objects can be analyzed to determine
whether, in fact, the objects are targets.

A variety  of  techniques  are  available  for  testing 
whether  a window  is  of  sufficient  interest  to  merit  further 
processing.  The  a priori  probability  of  occurrence  of  a 
"noise"  window  depends  on  the  type  of  terrain  being  imaged. 

A hilly,  wooded  scene  might,  for  example,  contain  many 
objects,  while  a flat  desert  scene  might  consist  largely  of 
empty  windows.  Terrain  features  such  as  hillocks  and 
roads  will  also  contribute  objects  for  further  analysis. 

Later  processing  has  the  task  of  extracting  the 
objects  and  classifying  them  as  targets  or  non-targets 
based  on  shape  features,  gray  level,  size,  texture,  etc. 

This  processing  generally  requires  thresholding  to  segment 
the  scene.  Automatic  threshold  selection  is  treated  in 
Section  D.  However,  in  the  case  of  a noise  window,  choosing 
a threshold  is  not  only  futile  but  dangerous,  in  that  seg- 
mentation, by  its  nature,  will  often  find  spurious  "objects" 
in  a noise  window. 

We are investigating methods of discovering noise
windows based on the spatial distribution of gray levels in
the window. Noise windows are generally homogeneous, and
so the central moments evaluated on the whole window should
have high values, since there is no concentration of gray
level. On the other hand, windows containing objects should
have lower moment values. Table 2 displays the
moment values for 10 noise windows and 30 target windows.
A classification experiment using the Fisher linear discriminant
produced the following confusion matrix:

                 Classified as
                 Noise   Target
    Noise          8        2
    Target        11       19

A large  number  of  the  misclassified  targets  were  small,  in- 
distinct, and  located  on  uniform  backgrounds.  Inasmuch  as 
the  misclassified  targets  were  successfully  thresholded 
(see  Section  D) , it  seems  likely  that  the  statistical  uni- 
formity of  95%  of  the  window  obscured  the  hotter  target 
region. 

A second experiment based not on the gray level
image but on the output of a difference operator (absolute
differences of 8x8 averages) showed that moment features
may respond better to clusters of edge values than to
clusters of gray values. In this experiment the confusion
matrix was

                 Classified as
                 Noise   Target
    Noise          8        2
    Target         5       25

Here again, 4 of the 5 misclassified targets
were of the small indistinct variety. (See Tables 2c, 2d for
further details.)

Another  experiment  was  performed  to  investigate  the 
SPAN  technique  [1,  2]  as  a method  of  detecting  targets.  This 
technique (SPAN = Spatial Piecewise Approximation by
Neighborhoods) examines a set of neighborhoods of each

Table 2b. Feature values based on 8x8 differences

    Feature          Fisher Direction
    Avg. G.L.             .316
    St. Dev. G.L.        -.946
    x²                    .00723
    y²                    .0256
    xy                    .0706
    Threshold           30.5

    Misclassification Results
    Target Image No.:  3R, 6R, 31R, 34T, 34R, 45T, 52T, 52A, 54A, 55R, 57R
    Noise Image No.:   2, 50

Table 2c. Fisher linear discriminant classification
experiment results using Table 2a data.

    Feature          Fisher Direction
    Avg. G.L.             .199
    St. Dev. G.L.        -.980
    x²                    .00113
    y²                    .000562
    xy                   -.00350
    Threshold            -.630

    Misclassification Results
    Target Image No.:  34R, 48R, 52A, 55R, 57R
    Noise Image No.:   8, 38

Table 2d. Fisher linear discriminant classification
experiment results using Table 2b data.


image  point,  and  picks  the  largest  neighborhood  that 
satisfies  some  uniformity  criterion  (see  below) . If  this 
neighborhood  is  contained  in  some  other  point's  largest 
uniform  neighborhood,  it  is  discarded.  The  result  of  this 
process  is  a set  of  irredundant,  maximal  uniform  neighbor- 
hoods that  provide  an  approximation  to  the  given  image. 

The technique is described in greater detail in [1, 2].
Figures  18  and  19  illustrate  the  SPAN  technique 
and  its  application  to  two  FLIR  windows.  The  unifor- 
mity criterion  was  based  on  the  chi-square  test  for 
normality.  If  the  gray  level  distribution  in  a given 
neighborhood  satisfied  this  test,  the  neighborhood  was 
called  uniform  (in  the  sense  that  its  gray  level  population 
was  homogeneous) . The  maximal  uniform  neighborhoods  deter- 
mined in  this  way,  shown  in  Figure  18,  do  separate  the 
target  and  background  regions,  at  least  crudely.  The 
method  (as  currently  implemented)  is  computationally  costly, 
but  it  deserves  further  study  as  a possible  means  of 
facilitating target detection.

Figure 18a. Chi-square Test for Unimodality.

Cell picture: Row 1. Original image; maximal
                     SPAN radii at each point;
                     maximal radii with local
                     non-maxima suppressed;
                     detected edges.
              Row 2. Smoothed image; image
                     reconstruction using SPAN.
Chromosome:   Row 3. Same as Row 1.
              Row 4. Same as Row 2.

Figure 18b. Multimodality Test. Same as Fig. 18a.

Figure 19a. Chi-square Test.

Tank window: Same as Fig. 18a, Rows 1 and 2.
APC window:  Same as Fig. 18a, Rows 3 and 4.

Figure 19b. Same as Fig. 19a but with enhanced
contrast. Additional reconstructions
(Rows 2 and 4) using MAX, MIN
and average recombination rules.


D. Automatic Threshold Selection


In  Section  B a model  was  proposed  for  images  con- 
sisting of  objects  and  background,  each  with  characteristic 
gray  level  distributions.  If  the  gray  level  histogram  of 
the  image  is  markedly  bimodal,  one  may  choose  the  threshold 
at  the  valley  between  the  two  peaks  (possibly  shifted  towards 
the  smaller  peak  when  using  a maximum  likelihood  estimate) . 
However,  as  may  be  seen  in  Figure  20,  the  smaller  the  object, 
the  less  likely  the  histogram  is  to  exhibit  strong  bimodality. 
The  background  distribution  engulfs  the  object's  gray  level 
range  and  tests  for  bimodality  are  inconclusive. 

Our approach [3] to solving this problem has been to
select from the original image a set of points that are as
likely to fall within the object as within the background.
If one examines the output of operators which respond to
edges, then high values should correspond to points falling
at or near object edges. These points are as likely to
lie on the object as on the background and their mean gray level
should correspond to the desired threshold. At first it
was thought that the distribution of such points for a
sufficiently coarse detector would be bimodal. However,
the model of Section B has shown the distribution
to be unimodal with a peak at the mean.

A number  of  edge  operators  were  tried  in  connec- 
tion with  this  approach.  Figure  21  shows  edge  values  for 
the  following  operators: 


Figure 20. 156 Windows - sampled to 64 X 64.
Each window has a corresponding histogram in
which grid lines identify intervals of 100 image
points. Image reference numbers refer back to
ground truth.

Figure 21 (continued)


Laplacian: |e - (a+b+c+d+f+g+h+i)/8|, where the
neighborhood of e is

    a b c
    d e f t
    g h i u
      v w

Roberts Gradient: max{|a-e|, |b-d|}.

Three-by-three: max{|a+b+c-g-h-i|, |a+d+g-c-f-i|}.

2x2 Difference:
1/4*max{|d+e+g+h-f-t-i-u|, |b+c+e+f-h-i-v-w|}.
(In other words, the value corresponds to the
maximum of the differences between 2x2 averages
over adjacent pairs of horizontal and
vertical neighborhoods.)

4x4 Difference: This is the same as the 2x2 difference
except that averages are taken over 4x4
neighborhoods.

8x8 Difference: The same as the previous except
that averages are taken over 8x8 neighborhoods.
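
The operator definitions above translate directly into code. The sketch below is illustrative Python, assuming the image is a list of rows of numbers and that (r, c) is far enough from the border for every neighbor to exist. The placement of the n x n blocks for n larger than 2 is an assumption extrapolated from the 2x2 formula; the report does not spell it out.

```python
def laplacian(img, r, c):
    """|e - mean of the 8 neighbors| at interior pixel (r, c)."""
    total = sum(img[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1))
    e = img[r][c]
    return abs(e - (total - e) / 8)


def roberts(img, r, c):
    """max{|a-e|, |b-d|}: differences across the two diagonals of the
    2x2 block whose lower-right corner is e."""
    a, b = img[r - 1][c - 1], img[r - 1][c]
    d, e = img[r][c - 1], img[r][c]
    return max(abs(a - e), abs(b - d))


def three_by_three(img, r, c):
    """max{|top row - bottom row|, |left column - right column|}."""
    top = sum(img[r - 1][c - 1:c + 2])
    bottom = sum(img[r + 1][c - 1:c + 2])
    left = sum(img[r + k][c - 1] for k in (-1, 0, 1))
    right = sum(img[r + k][c + 1] for k in (-1, 0, 1))
    return max(abs(top - bottom), abs(left - right))


def block_sum(img, r0, c0, n):
    """Sum of the n x n block whose upper-left corner is (r0, c0)."""
    return sum(img[r][c] for r in range(r0, r0 + n) for c in range(c0, c0 + n))


def difference_of_averages(img, r, c, n=2):
    """Maximum absolute difference between adjacent n x n averages,
    horizontally and vertically.  For n = 2 this reproduces the 2x2
    formula; the block placement for larger n is an assumption."""
    left = block_sum(img, r, c - n + 1, n)
    right = block_sum(img, r, c + 1, n)
    upper = block_sum(img, r - n + 1, c, n)
    lower = block_sum(img, r + 1, c, n)
    return max(abs(left - right), abs(upper - lower)) / (n * n)


# Hypothetical 6x6 test image: a vertical step edge between columns 2 and 3.
step = [[0, 0, 0, 8, 8, 8] for _ in range(6)]
```

Negative indices wrap around silently in Python, so a caller would need to restrict (r, c) to interior points; border handling is not addressed in the report.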


In order to select high edge values, three percentiles
were chosen: 80%, 90% and 95%. Figure 21 also
illustrates the masks consisting of points whose edge
values were in the top 20%, 10% and 5%, respectively. The
gray levels (in the original images) at these points were
histogrammed and the means and modes tabulated (Table 3).
The mean values were used as thresholds on the original
images. Figure 22 shows the resulting thresholded images.


Table 3. Gray level statistics for points of high
edge value based on 80th, 90th, and 95th
percentiles for five edge operators. ("OP"
codes represent in order: Laplacian,
Roberts gradient, 3x3 gradient, 4x4 averages
difference, 8x8 averages difference.
IRN refers to the image reference number in
the NVL data base.)

Figure 22. Thresholded images, using as threshold
the mean gray level of the points in the
80th edge value percentile, for the
following edge detectors:


Discussion 


A number  of  parameters  were  introduced  in  this  study, 
including  the  choice  of  edge  operator,  and  the  percentile 
cutoff.  The  Laplacian  gave  poor  results  due  to  the  "rampi- 
ness"  of  the  edges.  The  gradients  produced  reasonable  edge 
masks  but  predicted  different  thresholds  (i.e.,  different 
means  of  high  edge  value  gray  levels) . The  4x4  operator 
produced  good  thresholds  overall.  This  may  be  explained  by 
noting  that  the  ramp  width  as  determined  from  the  Roberts 
gradient  was  approximately  3.  Thus  operators  based  on  3x3 
neighborhoods  or  smaller  could  not  span  the  ramp  and  give 
accurate gradient values. The 4x4 difference, on the other
hand, does span the ramp edge more effectively. If the
neighborhood  is  too  big,  as  in  the  8x8  difference,  the  edge 
operator  will  be  biased  in  favor  of  the  background  (since  the 
object  shape  is  usually  convex)  and  will  on  occasion  span  the 
whole  object  (thus  decreasing  its  response  at  the  actual  edge) . 

The  choice  of  a percentile  cutoff  for  the  edge  values 
is  much  more  difficult  to  assess.  Clearly,  if  an  image 
contains  no  object  then  no  percentile  should  be  chosen  since 
the  operator  is  responding  only  to  noise.  The  use  of  a cut- 
off assumes  that  the  highest  edge  values  will  correspond  to 
object/background  edges  rather  than  to  noise.  This  is  valid 
in  the  case  of  the  difference  of  averages  operators,  since 
these  operators  will  not  respond  as  well  to  local  noise  as 
they  do  to  object  edges.  The  cutoff  should  therefore  be  at 
the  edge  value  likely  to  separate  out  almost  all  of  the 
object edge values. Clearly, the optimal value depends on
target size. However, in practice, the 80th percentile of
the 4x4 operator produced a reliable sample which contained
somewhat more background edge points than object points, resulting
in a low threshold; but this was deemed acceptable because
of the subsequent noise cleaning process which tended
to smooth tattered object boundaries.

Overall,  this  automatic  thresholding  technique  pro- 
vided reasonable  thresholds  and  produced  good  segmentations 
for  later  processing.  Computationally,  the  procedure  in- 
volved three  passes  over  the  input  image  and  one  pass  over 
the  intermediate  edge  image.  During  the  first  image  pass, 
the  edge  operator  is  applied  and  an  edge  image  created. 
Simultaneously, an edge value histogram is computed. The
80th percentile value is then determined from this histogram.
During the second pass, both the input image and the edge image are read.
The gray level values of those edge points whose edge values
are at or above the 80% cutoff value are histogrammed. The
mean  of  the  histogrammed  points  serves  as  the  threshold  for 
the  third  pass.  In  a dynamic  environment,  producing  thirty 
images  per  second,  it  should  be  possible  to  apply  the  three 
different  passes  in  pipeline  fashion  to  three  consecutive 
images  (assuming  that  the  gray  level  statistics  remain 
stable  over  the  period  necessary  to  process  three  images) . 

The  storage  requirements  would  be  reduced  to  the  number  of 
lines  necessary  to  compute  the  operator.  This  dynamic 
approach  will  be  tested  with  a real-time  sequence  of  images 
during  the  next  quarter. 
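
The multi-pass procedure can be summarized in a few lines. This is a minimal Python sketch rather than the implementation used in the study: it assumes the edge image has already been produced by the first pass, finds the percentile by sorting instead of by histogram, and treats gray levels strictly above the threshold as object points (the report does not state how ties are handled). All names are illustrative.

```python
def select_threshold(image, edge, percentile=80):
    """image and edge are equal-sized lists of rows.  Returns the gray
    level threshold and the thresholded (binary) image."""
    # End of pass 1: find the edge value at the requested percentile.
    edge_values = sorted(v for row in edge for v in row)
    k = (len(edge_values) * percentile) // 100
    cutoff = edge_values[min(k, len(edge_values) - 1)]

    # Pass 2: collect gray levels at points whose edge value is at or
    # above the cutoff, and take their mean as the threshold.
    selected = [g for grow, erow in zip(image, edge)
                for g, e in zip(grow, erow) if e >= cutoff]
    threshold = sum(selected) / len(selected)

    # Pass 3: threshold the original image.
    binary = [[1 if g > threshold else 0 for g in row] for row in image]
    return threshold, binary


# Hypothetical one-row window: dark background, bright object, and two
# high-edge-value points on the ramp between them.
gray = [[10, 10, 10, 10, 12, 28, 30, 30, 30, 30]]
edges = [[0, 0, 0, 1, 9, 9, 1, 0, 0, 0]]
threshold, binary = select_threshold(gray, edges)
```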




An  alternative  computational  approach  determines  the 
threshold  in  a single  pass  over  the  input  image,  at  the  cost 
of  storing  a 2-D  histogram  within  an  array  of  counters.  If 
the gray level value at the current image point is i and
the edge operator value is j, then the (i,j)th counter is incremented.
Now, high edge values correspond to high-index
rows  in  the  2-D  histogram  array.  The  row  sums  form  the 
edge  value  histogram,  whose  80th  percentile  is  then  chosen. 
Next,  the  column  sums  are  formed  for  all  rows  at  or  above 
the  80%  cutoff.  These  sums  constitute  the  gray  level  histo- 
gram for  high  edge  values.  The  mean  of  this  histogram  is 
the  desired  threshold. 
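
The single-pass alternative can be sketched as follows. This is illustrative Python under stated assumptions: gray levels and edge values are non-negative integers below `levels` and `edge_levels` respectively (both parameter names are hypothetical), and the histogram is stored with one row per edge value and one column per gray level, matching the row/column description above.

```python
def threshold_from_joint_histogram(image, edge, levels, edge_levels,
                                   percentile=80):
    """Return the mean gray level of points whose edge value lies at or
    above the given percentile, using only a 2-D array of counters."""
    # Single pass over the image: hist[j][i] counts points with edge
    # value j (row) and gray level i (column).
    hist = [[0] * levels for _ in range(edge_levels)]
    for grow, erow in zip(image, edge):
        for g, e in zip(grow, erow):
            hist[e][g] += 1

    # Row sums form the edge value histogram; locate the percentile.
    row_sums = [sum(row) for row in hist]
    total = sum(row_sums)
    if total == 0:
        raise ValueError("empty image")
    cumulative = 0
    for cutoff, n in enumerate(row_sums):
        cumulative += n
        if cumulative * 100 > total * percentile:
            break

    # Column sums over rows at or above the cutoff form the gray level
    # histogram for high edge values; its mean is the threshold.
    col_sums = [sum(hist[j][i] for j in range(cutoff, edge_levels))
                for i in range(levels)]
    count = sum(col_sums)
    return sum(i * n for i, n in enumerate(col_sums)) / count
```

The trade-off is the one stated in the text: the image is read once, but storage grows with the product of the number of gray levels and the number of edge values.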


E1. Edge Reinforcement Prior to Noise Cleaning

Noise  cleaning  operations,  in  particular,  parallel 
shrink/expand  algorithms,  will  delete  points  from  ragged 
edges  of  objects.  The  result  is  that  an  object  will  be  dis- 
played with  fewer  points  than  were  proposed  by  the  threshold- 
ing step.  One  way  of  avoiding  this  is  to  choose  a slightly 
more  generous  threshold,  thus  adding  in  more  background 
points  which  presumably  are  later  rejected  during  noise 
cleaning.  Unfortunately,  this  strategy  adds  in  noise  points 
all  over  the  image.  A technique  which  adds  only  points  at 
or  near  the  boundary  of  objects  is  preferable.  Such  points 
generally  have  high  gradient  value.  Section  B suggested  the 
following  technique:  compute  the  threshold  automatically 

as  described  in  Section  D,  and  use  a combined  (gray  level, 
gradient  value)  threshold  to  include  high  gradient  value 
points  which  don't  quite  exceed  the  gray  level  threshold 

value. In practice, this may be implemented using the 2-D
(gray level, gradient) histogram; such histograms are shown
in Figure 23. On such a histogram, a vertical line corresponds
to a gray level threshold, while an oblique line
corresponds to a combined (gray level, gradient value)
threshold. Image points whose (gray level, gradient value)
pair lies to the left of BC are considered to be above
threshold. Figure 24 illustrates this for several values of
θ. Note that the effect of varying θ is mainly the accretion
or loss of edge points. Clearly, the implementation of
a combined threshold can be accomplished in the single


Figure 23. Two-dimensional (gray level, gradient)
histograms. The displayed histogram
value at row R and column C is the (log
scaled) number of image points which
have edge value R and gray level C.

Key: Hx denotes the two-dimensional histogram for
edge detector x.
Image reference numbers: 47R, 48R, 21A, 27A.

Figure 24 (continued)

thresholding pass. Further study of the model is needed
to predict the appropriate value of θ for this type of
thresholding procedure.
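
One way to read the combined threshold is as a test against an oblique line in the (gray level, gradient) plane. The Python sketch below is an assumption-laden illustration: it parameterizes the line by its angle from the vertical (`theta_deg`, a hypothetical name), so that an angle of zero reduces to the plain gray level threshold, and it accepts a point when its gray level plus a gradient bonus reaches the threshold. The report's exact parameterization of the line is not recoverable from the text.

```python
import math


def combined_threshold(image, edge, t, theta_deg):
    """Mark a point as above threshold when g + tan(theta) * e >= t,
    where g is its gray level and e its gradient value."""
    slope = math.tan(math.radians(theta_deg))
    return [[1 if g + slope * e >= t else 0 for g, e in zip(grow, erow)]
            for grow, erow in zip(image, edge)]


# Hypothetical row: a background point, an edge point just under the
# gray level threshold, and an object point.
gray = [[10, 18, 25]]
grad = [[0, 6, 1]]
plain = combined_threshold(gray, grad, t=20, theta_deg=0)
reinforced = combined_threshold(gray, grad, t=20, theta_deg=45)
```

With a nonzero angle only the high-gradient point changes class, which is the intended effect: points are added at object boundaries rather than all over the image.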


The  images  in  the  data  set  are  oversampled  at  a 
ratio  of  over  2:1  for  the  purpose  of  scaling  the  horizontal 
and  vertical  axes.  The  processed  windows,  however,  were 
sampled  down  2 to  1.  The  resulting  windows  exhibit  moderate 
to severe high frequency noise. An effort was made to reduce
this noise by producing windows based on 2x2 averaging
rather than sampling. Thus, instead of discarding every
other  row  and  column,  each  pixel  in  the  sampled  image  was 
the  average  over  a (disjoint)  2x2  neighborhood  in  the 
original  image.  The  results  (Figure  25)  show  that  a smooth, 
less  noisy  image  was  produced  and  that  row  dropouts  were 
partially  eliminated.  However,  the  images  seemed  to  have 
less contrast. Further experiments will determine if averaged
windows should replace the sampled windows.

Figure 25. Sampling vs. averaging in target windows.

a. The computed average using truncation.

b. Same, using rounding.
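
The difference between sampling and 2x2 averaging, and between truncation and rounding of the average, can be illustrated with a short Python sketch. It assumes integer gray levels and an image stored as a list of rows; the function names are illustrative.

```python
def subsample(img):
    """Keep every other row and column (2:1 sampling)."""
    return [row[::2] for row in img[::2]]


def average_2x2(img, rounding=False):
    """Replace each disjoint 2x2 neighborhood by its average, either
    truncated or rounded to the nearest integer (halves round up)."""
    out = []
    for r in range(0, len(img) - 1, 2):
        row = []
        for c in range(0, len(img[r]) - 1, 2):
            s = img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]
            row.append((s + 2) // 4 if rounding else s // 4)
        out.append(row)
    return out


# Hypothetical 4x4 window with some pixel-level noise.
window = [[1, 2, 9, 9],
          [2, 2, 9, 8],
          [0, 0, 4, 4],
          [0, 3, 4, 4]]
sampled = subsample(window)
truncated = average_2x2(window)
rounded = average_2x2(window, rounding=True)
```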


E3. Noise Region Filtering by Simultaneous Local Operations

The  result  of  thresholding  a new  (unsmoothed) 
image  is  a binary  valued  image  with  object  points  identified 
(nominally)  by  the  value  1 and  background  points  by  the 
value  0.  In  general,  one  often  obtains  isolated  points  or 
small  regions  corresponding  to  spurious  identifications  in 
both  the  background  and  object  regions  (see  Figure  22) . 

Such  regions,  which  will  be  called  noise  regions,  are  recog- 
nizable by  their  small  size  and  their  isolation,  especially 
since  it  has  been  postulated  that  object  regions  are  com- 
pact. Note  that  these  noise  regions  are  artifacts  of  the 
thresholding  and  may  not  be  readily  visible  in  the  original 
image . 

One method of eliminating noise regions is to
preprocess the image by smoothing. Preprocessing algorithms
based on blurring and median filtering will be studied
more extensively in the next quarter. Another approach is
to postprocess the thresholded image to delete the noise
regions. This section discusses postprocessing techniques
which eliminate small and/or non-compact regions from a
thresholded image.

The method which was investigated consists of
multiple applications of two processes, "shrink" and
"expand"; for example, two shrinks followed by two expands.
The purpose of the sequence of shrinks is to shrink objects
in a uniform manner so that small or insubstantial objects
disappear entirely. The sequence of expands is meant to
regrow the remaining shrunken objects to their original
size. The result of the shrinks/expands is the elimination
of small regions (presumed to be noise regions).

Each  shrink  or  expand  requires  the  simultaneous 
or  "parallel"  application  of  a local  replacement  rule  at 
every  point  of  the  thresholded  image.  This  means  that  all 
transform  decisions  are  made  on  untransformed  data,  as  dis- 
tinguished from  the  sequential  application  of  the  trans- 
formation rule  in  a raster  fashion  with  transformed  point 
values  replacing  the  original  values  as  they  are  computed. 

The form of the shrink rule is as follows: Rewrite
each 1 as 0 if any (at least one) of its neighbors is
already 0. Zero values are unchanged. (The 4-neighbor
case treats, as neighbors, points horizontally or vertically
adjacent to the given point; the 8-neighbor case includes
the points diagonally adjacent as well.) Such a rule decreases
the number of 1's in the thresholded image; thus,
the image "shrinks". The rule can be interpreted as
eliminating all 1's adjacent to 0's. In fact, only 1's
surrounded by 1's will survive a shrink. Two shrinks
applied in succession will eliminate all 1's at a distance
of two or fewer raster units (city block or chessboard
distance) from the nearest 0. The number of successive
shrinks determines the minimum diameter of a region for it
to survive; e.g., one shrink eliminates all objects with
diameter two or less; two shrinks eliminate objects with
diameters of four or less.


The expand rule is similar to the shrink rule:
rewrite a 0 as a 1 if any of its neighbors are 1's, but
leave 1's unchanged. Thus points adjacent to 1's become
1's, thereby increasing the number of 1's. If we wish to
restore objects (that are not eliminated) to about their
original sizes, t shrinks should be followed by t expands.
Such a shrink/expand sequence produces an image whose 1's
correspond to (a subset of the) 1's in the untransformed
binary image. Thus, for example, isolated 1's are
eliminated, and objects joined by narrow necks of 1's may
become disconnected. Also, thin protrusions from a region
of 1's will disappear. Figure 26 illustrates the shrink/
expand algorithm for both the 4 and 8 neighbor cases and
t = 1, 2, 3 (the numbers of shrinks and expands used).
Figure 27 shows the effect that the choice of edge operator
in threshold selection has on the subsequent noise cleaning.

The shrink rule as formulated was unsatisfactory because
it tended to delete too much; it tended to produce
regions all of the same shape (diamond-shaped); and it did
not fill in pinholes. A generalization of the shrink rule
was formulated as follows: delete a 1 if at least k of its
neighbors are 0's (zeros remain unchanged). The original
shrink rule corresponds now to k = 1. The generalized
shrink is more conservative in that if k > 1, it takes more
zero evidence to convert a 1 to 0. The generalized expand
is analogously defined: Rewrite a 0 as 1 if it has at
least k 1's as neighbors (ones remain unchanged). Note


Figure 26. Effects of iterating SHRINK/EXPANDS (S/E's).

a. Original images - each column is a single
   image thresholded at four different values.
b. 4-neighbor rule - one S/E
c. 4-neighbor rule - two S/E's
d. 4-neighbor rule - three S/E's
e. 8-neighbor rule - one S/E
f. 8-neighbor rule - two S/E's
g. 8-neighbor rule - three S/E's


that for increased k, the generalized expand rule is not
quite as generous in providing new 1 values. However, it
does fill pinholes in sufficiently large regions. Figure
28 provides a comparison for t = 1, 2 and k = 1, 2, 3.
Figure 29 presents a further comparison based on the 4x4
edge operator used in threshold selection. It appears from
these examples and from Figure 26 that the shrink/expand
rule with t = 2 and k = 3 applied to each image point and
its 8-neighbors provides efficient noise cleaning with most
noise regions eliminated, pinholes filled, and only a
modest amount of target shape distortion.
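
The generalized shrink and expand rules can be sketched in Python as below. Each call builds a new image from the untransformed one, which is what makes the application "parallel". Points outside the image are treated as 0; that border convention is an assumption, since the report does not say how window borders were handled. Setting k = 1 gives the original rules.

```python
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def count_neighbors(img, r, c, value, offsets):
    """Number of neighbors of (r, c) equal to value; points outside
    the image count as 0 (an assumed border convention)."""
    rows, cols = len(img), len(img[0])
    n = 0
    for dr, dc in offsets:
        rr, cc = r + dr, c + dc
        inside = 0 <= rr < rows and 0 <= cc < cols
        if (img[rr][cc] if inside else 0) == value:
            n += 1
    return n


def shrink(img, k=1, offsets=N8):
    """Delete a 1 if at least k of its neighbors are 0."""
    return [[1 if v == 1 and count_neighbors(img, r, c, 0, offsets) < k else 0
             for c, v in enumerate(row)] for r, row in enumerate(img)]


def expand(img, k=1, offsets=N8):
    """Rewrite a 0 as 1 if at least k of its neighbors are 1."""
    return [[1 if v == 1 or count_neighbors(img, r, c, 1, offsets) >= k else 0
             for c, v in enumerate(row)] for r, row in enumerate(img)]


def shrink_expand(img, t=2, k=3, offsets=N8):
    """Apply t shrinks followed by t expands."""
    for _ in range(t):
        img = shrink(img, k, offsets)
    for _ in range(t):
        img = expand(img, k, offsets)
    return img


# Hypothetical 7x7 window: a 3x3 object plus one isolated noise point.
window = [[0] * 7 for _ in range(7)]
for r in range(1, 4):
    for c in range(1, 4):
        window[r][c] = 1
window[5][5] = 1
cleaned = shrink_expand(window, t=1, k=1, offsets=N8)
```

On this small window, one 8-neighbor shrink/expand with k = 1 removes the isolated point and restores the 3x3 object, whereas the 4-neighbor version regrows only a plus-shaped region, which illustrates the shape distortion noted above.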


Figure 28. Leniency in SHRINK/EXPAND definitions
for windows thresholded by two methods.

Figure 29. SHRINK/EXPAND of thresholded images based on
four edge operators (ROB, 3x3, 4x4 DIFF,
8x8 DIFF) and k = 1,2,3.


E4. Connected Component Analysis and Feature Extraction

The result of thresholding is a binary image.
Noise cleaning operations filter this image, but it still
remains to aggregate points into identified (labelled)
regions. A process which labels the individual disjoint
regions in the binary image, in a single raster scan, is
well known in the literature [4]. It is described briefly
in the following paragraphs.

A set of 1's in a binary image is connected if
any two points in it can be joined by a path (sequence) of
pairwise adjacent points lying in the set. A maximal connected
set is called a connected component. The algorithm
to be described produces the (unique) decomposition into
connected components and labels the individual components
(Figure 30).


As each line of the binary image is processed in
turn, it is converted into a list of sequences (runs) of
1's. This list is compared term for term with the list
for the previous line. Clearly any run in the current
line which is adjacent to (lies underneath) a run in the
previous line belongs to the same component as that previous
run. Each current run which is adjacent to a previous
run receives the label associated with that previous
run. If it is adjacent to several previous runs with different
labels then it is given one of those labels, and an
entry is made in a label equivalence table indicating that
these separate runs of the previous line lie in the same
component. If a current run is adjacent to no previous
runs, it is given a new (unused) label. When the current
runs have been labeled, the scan is advanced to the next
line. Once the final line has been processed, each point
has been associated with a single component (possibly involving
equivalent labels in the label equivalence table).
A second pass can now be used to relabel each point with a
unique component label, as illustrated in Figure 30.

Figure 30. Components 1
solid gray 1
displayed as
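
The run-based labelling scheme can be sketched as follows. This is an illustrative Python version, not the implementation of [4]: it treats two runs as adjacent when their column ranges overlap (4-connectivity, an assumption), and it keeps the label equivalence table as a small union-find structure. All names are hypothetical.

```python
def label_components(img):
    """Label the connected components of a binary image (list of rows
    of 0/1) and return an image of component labels 1, 2, ..."""
    parent = {}                          # label equivalence table

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    def runs_of(row):
        runs, start = [], None
        for c, v in enumerate(list(row) + [0]):   # sentinel closes a final run
            if v == 1 and start is None:
                start = c
            elif v != 1 and start is not None:
                runs.append((start, c - 1))
                start = None
        return runs

    # First pass: label the runs of each line against the previous line.
    labelled_rows, previous, next_label = [], [], 1
    for row in img:
        current = []
        for s, e in runs_of(row):
            label = None
            for ps, pe, pl in previous:
                if ps <= e and s <= pe:           # column ranges overlap
                    if label is None:
                        label = pl
                    else:
                        union(label, pl)          # record an equivalence
            if label is None:                     # adjacent to no previous run
                label = next_label
                parent[label] = label
                next_label += 1
            current.append((s, e, label))
        labelled_rows.append(current)
        previous = current

    # Second pass: replace each label by a unique component number.
    out = [[0] * len(row) for row in img]
    final = {}
    for r, runs in enumerate(labelled_rows):
        for s, e, label in runs:
            root = find(label)
            number = final.setdefault(root, len(final) + 1)
            for c in range(s, e + 1):
                out[r][c] = number
    return out


# Hypothetical window: a U-shaped object (two arms joined at the bottom)
# and a separate vertical bar.
window = [[1, 0, 1, 0, 0],
          [1, 0, 1, 0, 1],
          [1, 1, 1, 0, 1],
          [0, 0, 0, 0, 0]]
labels = label_components(window)
```

The two arms of the U receive different provisional labels on the first line and are merged through the equivalence table when the joining run is reached, which is the case the text describes.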


As a binary window is being segmented, one can
also process the original gray scale window (since they are
in register). Various statistics based on geometry and
gray level can be extracted for each component segment and
accumulated during the pass. The label equivalence table
can then be used to combine the statistics for component
segments which belong to the same component. For each
window, the set of components, along with the features of
each, was ordered by component size and stored in files
for later use in classification studies. The features
evaluated and stored for each component are:

1. Area (number of image points)

2. Average gray level

3. Standard deviation of gray level

4. (Average component gray level) - (Average background
gray level)
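
As an illustration of how statistics accumulated per component
segment might be combined through the label equivalence table,
the sketch below keeps a count, a gray level sum, and a sum of
squares for each provisional label and then derives features 1-4.
The data layout, the function names, the use of the population
(divide-by-n) standard deviation, and the way the background
average is supplied are all assumptions made for the example.

```python
import math


def merge_segment_stats(segment_stats, representative):
    """Combine running sums of segments that belong to one component.

    segment_stats:  {provisional label: (count, gray sum, gray sum of squares)}
    representative: {provisional label: component label}, i.e. the
                    resolved label equivalence table (assumed layout)
    """
    merged = {}
    for label, (n, s, ss) in segment_stats.items():
        comp = representative[label]
        n0, s0, ss0 = merged.get(comp, (0, 0.0, 0.0))
        merged[comp] = (n0 + n, s0 + s, ss0 + ss)
    return merged


def component_features(n, s, ss, background_mean):
    """Features 1-4 from the merged sums of a single component."""
    mean = s / n
    variance = max(ss / n - mean * mean, 0.0)  # guard against round-off
    return {
        "area": n,
        "mean": mean,
        "std": math.sqrt(variance),
        "contrast": mean - background_mean,
    }
```

Because only sums are stored, segments can be merged in any order
and no second look at the gray scale window is required.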


. Discrimination  and  Classification 

Given a set of features for objects known to lie
in disjoint classes, one desires a classification rule based
on the known features which will assign each object to its
respective class. Various procedures are known for proposing
such rules. The Fisher linear discriminant (cf. [5])
attempts to find the optimum linear projection of the
feature vectors onto a line, and the optimum partition of
this line, such that the ratio of between-class scatter to
within-class scatter is maximized.
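
For two classes, the direction that maximizes this ratio is
proportional to the inverse of the within-class scatter matrix
applied to the difference of the class means. The sketch below
computes that direction in plain Python. It is not the
implementation used in these experiments: the function names are
hypothetical, and placing the threshold midway between the
projected class means is an assumption, since the partition rule
is not spelled out here.

```python
def mean_vector(rows):
    """Component-wise mean of a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def scatter_matrix(rows, mean):
    """Sum of outer products of deviations from the class mean."""
    d = len(mean)
    s = [[0.0] * d for _ in range(d)]
    for r in rows:
        diff = [a - b for a, b in zip(r, mean)]
        for i in range(d):
            for j in range(d):
                s[i][j] += diff[i] * diff[j]
    return s


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fisher_direction(class_a, class_b):
    """Return (w, threshold) with w = Sw^-1 (mean_a - mean_b)."""
    ma, mb = mean_vector(class_a), mean_vector(class_b)
    sa, sb = scatter_matrix(class_a, ma), scatter_matrix(class_b, mb)
    sw = [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(sa, sb)]
    w = solve(sw, [p - q for p, q in zip(ma, mb)])
    proj_a = sum(wi * mi for wi, mi in zip(w, ma))
    proj_b = sum(wi * mi for wi, mi in zip(w, mb))
    return w, (proj_a + proj_b) / 2.0  # midpoint threshold (assumption)


def classify(w, threshold, x):
    """Assign x to class 'a' if its projection exceeds the threshold."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    return "a" if score > threshold else "b"
```

With this sign convention class a always projects above class b,
so a single comparison against the threshold suffices.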

An implementation of the Fisher algorithm was
used (as described in Section C) to discriminate noise windows
from object-bearing windows. In this section, we describe
experiments in classifying the components extracted
in Section E4 into target/non-target classes.

Using the 13 features described in Section E4, an
optimal linear classifier was trained on 30 target components
and 59 noise components located in the 30 target windows.
When the training set was used as a test set, two of the
targets and three of the noise regions were misclassified.
Attempts to omit features from the classification resulted
in lower scores. For example, in trying to assess the importance
of gray level and size, features 2, 3, 4, and 9 were
deleted. The resultant misclassifications consisted of four
targets and five noise regions (see Tables 4a, 4b).

In Section 2C, the need for an early object detection
phase was stated. If non-object-bearing windows are
thresholded and segmented into components, many noise regions

Feature      Fisher Direction

   1             .384-01
   2            -.559
   3            -.285
   4            -.771-01
   5            -.250
   6             .519
   7             .287-01
   8             .170
   9             .471
  10             .348-03
  11             .103-02
  12            -.355-01
  13            -.116

Threshold       -.282+01

Target misclassifications: Image ref. nos. 57R, 52A


Table 4a. Fisher linear discriminant results
on 30 targets and 59 noise regions
using features 1-13.


Feature      Fisher Direction

   1             .155-01
   5            -.483
   6            -.358
   7             .311-01
   8            -.231
  10             .763
  11             .135-03
  12             .151-02
  13             .441-01

Threshold       -.615+01

Target misclassifications: Image ref. nos. 31R,
55R, 57R, 52A


Table 4b. Fisher linear discriminant results
on 30 targets and 59 noise regions
using features 1,5-8,10-13.


are generated which may be misclassified as targets, thus
increasing the false alarm rate. An experiment verified
the need for a separate detection phase. In this experiment,
when the noise windows were segmented and classified
with the target windows (the noise windows contributing
only noise components), 28 of 30 targets were recognized
while 106 of 114 noise components were correctly identified.
Thus of 36 objects identified as targets (28 true targets
plus 8 noise components), 8 are false alarms (see Table 4c).
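
The bookkeeping behind these figures can be made explicit with a
few lines of code. The sketch below only tallies the counts quoted
in the paragraph above. The function and key names are
hypothetical, and the two ratios are conventional summary
measures, not quantities reported in this study.

```python
def detection_summary(targets_total, targets_found,
                      noise_total, noise_rejected):
    """Tally classification outcomes for targets and noise components."""
    false_alarms = noise_total - noise_rejected   # noise called target
    declared = targets_found + false_alarms       # everything called target
    return {
        "false_alarms": false_alarms,
        "declared_targets": declared,
        "detection_rate": targets_found / targets_total,
        "false_alarm_fraction": false_alarms / declared,
    }


# Counts from the experiment with noise windows included:
# 28 of 30 targets recognized, 106 of 114 noise components rejected.
summary = detection_summary(30, 28, 114, 106)
```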

These experiments, although indicating some degree of
success, were run on windows chosen by human observers as
having only moderate amounts of noise. In unpreprocessed
images, the likelihood of a noise window may be far greater
than the likelihood of an object window, and so the object
detection phase false alarm rate for noise windows (in this
case, 2 out of 10) is a crucial parameter. The classification
aspect of this project must be broadened in every way
(a larger data base, more informed feature selection,
and a better classification strategy, e.g., Bayes), both
in the object detection phase and in the component classification
phase.


Feature      Fisher Direction

   1             .305-01
   2            -.580
   3            -.341
   4            -.141
   5            -.144
   6             .489
   7             .148
   8            -.240
   9             .377
  10            -.674-04
  11             .979-03
  12            -.105
  13            -.185

Threshold       -.426+01

Target misclassifications: Image ref. nos. 57R, 52A


Table 4c. Fisher linear discriminant results
on 30 targets and 114 noise regions
using features 1-13.


3. Plans for the next quarter
A.  Data  Sets 

The algorithms investigated up to now have been
tested on a data set consisting of 40 images, selected from
a larger data base of 90 frames containing 137 targets, as
described in Section 2. More extensive tests, involving
the entire data base, are planned. An additional data base
has just been obtained, and it too will be used in future
experiments. It is planned to acquire at least two further
data bases which will also provide test data. The use of
multiple data bases will serve as a check on the generality
of both algorithms and image models. One of the data sets
to be acquired will consist of real-time sequences of
frames, and this will make it possible to study temporal aspects
of the target detection process, including tracking
of targets from frame to frame.


B.  Models 


A first-approximation model for target segmentation,
based on histogramming the joint occurrences of edge
values and gray levels in an image, has been developed, as
described in Section 2B. This model has suggested a number
of segmentation strategies involving classification in edge/
gray level space, rather than pure thresholding or pure
edge detection. These strategies need to be investigated.
Also, other types of models based on local property co-occurrences
should be formulated and studied. It is expected
that this work will lead to an increased understanding
of the image segmentation and target detection problem.
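
To make the notion of a joint edge/gray level histogram concrete,
the sketch below counts, for each image point, the pair (gray
level bin, edge value bin). The edge operator used here (sum of
absolute differences to the right-hand and lower neighbors), the
bin counts, and the function name are assumptions chosen for
illustration. The operator actually used in Section 2B is not
restated here.

```python
def joint_edge_gray_histogram(image, gray_bins, edge_bins, max_gray=255):
    """Histogram of joint (gray level, edge value) occurrences.

    image is a list of rows of integers in 0..max_gray.
    Returns hist[gray_bin][edge_bin]; the last row and column of
    the image are skipped because they lack a neighbor.
    """
    height, width = len(image), len(image[0])
    max_edge = 2 * max_gray  # largest possible |dx| + |dy|
    hist = [[0] * edge_bins for _ in range(gray_bins)]
    for y in range(height - 1):
        for x in range(width - 1):
            g = image[y][x]
            # Assumed edge value: simple difference operator.
            edge = abs(image[y][x + 1] - g) + abs(image[y + 1][x] - g)
            gb = min(g * gray_bins // (max_gray + 1), gray_bins - 1)
            eb = min(edge * edge_bins // (max_edge + 1), edge_bins - 1)
            hist[gb][eb] += 1
    return hist
```

Points in the interior of a uniform region fall in the low-edge
column of their gray level row, while points on a boundary move
toward higher edge bins, which is what makes classification in
edge/gray level space possible.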




C.  Windows 

The experiments performed during the past quarter
have employed square image windows which may or may not contain
targets. The distance to the ground area covered by
such a window depends on the position of the window within
the frame (as well as on the attitude and altitude of the
sensor). Normally, windows near the top of a frame will
show more distant parts of the terrain, while those near the
bottom will show closer parts, so that targets will appear
smaller near the top than near the bottom. The radiation
reaching the sensor from a window also depends on distance.
This information can and should be used in choosing parameter
values for the algorithms that are applied to a window.


E.  Classifiers 


In the experiments carried out thus far, a simple
Fisher linear discriminant classifier has been used. It is
planned to investigate the advantages of more powerful (e.g.,
maximum-likelihood) classifiers. In particular, the tradeoff
between false alarm and false dismissal rates will be
explored. Sequential decision procedures, e.g., decision
trees, will also be investigated. In this connection, it is
planned to make use of the Maryland Interactive Pattern
Analysis and Classification System (MIPACS) in the Laboratory
for Pattern Analysis at the University, which provides
a wide range of tools for classifier design.
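
One simple way to examine the tradeoff between false alarm and
false dismissal rates is to sweep the decision threshold over the
one-dimensional discriminant scores and tabulate both error rates
at each setting. The sketch below does this. The function name and
the convention that an object is declared a target when its score
exceeds the threshold are assumptions, not part of the planned
work as stated.

```python
def error_tradeoff(target_scores, noise_scores):
    """Tabulate error rates for every candidate threshold.

    Returns a list of (threshold, false alarm rate, false dismissal
    rate), where an object is declared a target if score > threshold.
    """
    points = []
    for t in sorted(set(target_scores) | set(noise_scores)):
        false_alarms = sum(1 for s in noise_scores if s > t)
        dismissals = sum(1 for s in target_scores if s <= t)
        points.append((t,
                       false_alarms / len(noise_scores),
                       dismissals / len(target_scores)))
    return points
```

Raising the threshold can only lower the false alarm rate and
raise the false dismissal rate, so the resulting table traces the
operating curve from which a working point may be chosen.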